Text File | 1995-03-13 | 47KB | 1,082 lines

============================================================================
||                                                                        ||
||  MAT -- p.n.; poss. abbrev. of "Match"; also "Matte" [Motion           ||
||  Picture Arts]: means of cutting, inserting and superposing            ||
||  disparate items.                                                      ||
||                                                                        ||
||                          -------------------                           ||
||                                                                        ||
||  This program provides a flexible string-matching and substitution     ||
||  mechanism for both text and filenames. It will probably be most       ||
||  useful within command script files to extend the operations           ||
||  possible with AmigaDOS. The matching scheme is an extended            ||
||  version of the standard AmigaDOS pattern-matching convention,         ||
||  with the added features of negation and "slicing" of matched          ||
||  strings.                                                              ||
||                                                                        ||
||          * Searches for patterns within text files.                    ||
||                                                                        ||
||          * Rearranges text within matched lines to                     ||
||            create new files.                                           ||
||                                                                        ||
||          * Searches directories for matching file names.               ||
||                                                                        ||
||          * Creates Command Script Files using the whole or             ||
||            parts of matched filenames.                                 ||
||                                                                        ||
||                                                                        ||
||                -- Copyright (C)1987 Peter J. Goodeve --                ||
============================================================================

                        -- by Pete Goodeve --
                             August 1987

This is one of those programs that is probably overkill for most purposes, but on rare occasions may do something that no other program can. It grew out of some long-time desires I've had while working with the Amiga (...and grew... and grew...). First, I wanted to be able to search text files for more than just simple strings: I wanted full pattern-matching ability. Second, I wanted to be able to do things equivalent to this sort of "impossible" command:

        "RENAME myfile#? AS myfile_backup#?"

Eventually, some tortuous evolution led to a single program, "Mat", that combines both facilities into one (because 90% of the code was common).
Along the way, I extended Richards' original matching algorithm -- recoded in C from BCPL -- to handle negation patterns ("don't match the string if this pattern matches"), the "*" as an optional alternative to "#?", and the marking of "slice points" in the matched string so it can later be cut into pieces and spliced with other text. The resulting program is much more flexible than "Search" or any of the file matching commands, but it is not by any means a full language like, say, "awk". Because of its flexibility, the command line syntax can get a little involved compared to other DOS commands, so it is probably easiest to use within prepared script files. With a command syntax like that, of course, Mat can't be invoked from a WorkBench icon, only from a CLI. A couple of things I should also mention at this early opportunity. In distinction to "Search" -- which is a simple string search -- this is a line-matching program. In other words the pattern you specify must match the whole line; if the pattern you are looking for can occur anywhere on the line (an "unanchored" pattern) it must be preceded and followed by wild-cards ("#?" or "*"). You will also find that pattern matching can be SLOWW! On a simple (unanchored) string search it is about half the speed of "Search", and each alternation you add increases the time by about the simple-search time: it is doing a lot of work on each character in the file. On the other hand, anchored matches can be quite fast, because each line scan can quit at the first failure. You may find it preferable to RUN the program as a background process; it works nicely with "pipes" if you have them installed. The program can essentially operate in two different modes: 1) it can scan text files for lines matching a specified pattern; 2) it can scan directories for filenames matching patterns. 
In either case matched strings can either be output unchanged or they can be cut up (according to "slice marks" in the pattern) and rearranged -- possibly with added text -- on the basis of a supplied "template". As well as pieces of the source string, things like line-count and current filename can be spliced into the output.

(There are actually two more modes. In one you can perform the same sort of operations on literal arguments in the command line; this is only really intended for the rare occasion you might want to do some complex test or operation on a command script argument. The other mode sends the contents of the specified files in sequence to the standard output with no processing at all -- it's a little more flexible than JOIN.)

All control is from the command line; a specific syntax -- possibly including keywords -- determines each form.

Before getting into a complete description, some specific examples might give a flavor of what it can do.

        MAT *(word|another)* myfile

Mat scans "myfile" for lines containing "word" or "another", and displays them on screen. (Or of course you could redirect them to a desired file with the ">" convention.) Note the <any-string> markers ("*") surrounding the main pattern so that it is unanchored.

        MAT (W|w)ord* myfile

looks for "Word" or "word" at the beginning of a line only. The match is normally case-sensitive -- although as we'll see later there is a keyword to switch this off.

        MAT "#?(word|another||another word)#?" myfile

In this case Mat will look for "word" and "another" as before, but will NOT match a line containing the string "another word". The double vertical bar is one of two ways of signifying negation.
Notice that quotes around the pattern are necessary now because of the imbedded space; I also used the alternative way of specifying <any-string> ("#?"), partly for demonstration and also because it may be a good idea to avoid the asterisk inside quotes as a general principle (it is itself a quoting character in certain circumstances). MAT *^(word|another)^* "^1: ^0----^2" myfile This time the pattern includes "slice marks" ("^"), so it is followed by a template indicating how the pieces are to be output. "^0" is the segment of the line matched by the part of the pattern before the first slice mark, "^1" is the segment between the two marks (i.e. "word" or "another", depending on which was matched), and "^2" is the rest of the line. Thus, if the input line was "this line contains the word we are looking for" the output would be "word: this line contains the ---- we are looking for" (By the way, in this document "^" always means the caret character -- NEVER the control key!) MAT *(word|another)*^ "^F: ^O" my#? :T/#?.txt Here Mat is doing the same search as in the first example, but we have included a slice mark to force the next argument to be a template. This time, instead of just searching "myfile", we are looking at ALL files beginning with "my", and also all files in the directory :T that end in ".txt". The "^F" code in the template outputs the current filename, and "^O" ("Oh" -- not "zero") outputs the whole Original line. MAT >RAM:rn FILES "rename ^F as ^0_old^1" #?^.c In this final example we can see how to achieve my dream of renaming multiple files. The output of Mat is directed to the temporary file "RAM:rn" so that we can then EXECUTE it as a command script to do the actual renaming. The keyword "FILES" indicates that the search is to be for filenames, rather than the text they contain; in this syntactic form the keyword must be followed by a template to generate the output line. 
In this form, we can put slice marks into the file pattern, so no separate main pattern is needed. Within the template, "^F" refers to the original filename as before, and the new name will have "_old" inserted at the slice point (immediately before ".c", as specified in the file pattern). If you got completely lost in the foregoing, please read the full manual that follows and try again. First will be a description of the command line syntax for the various modes, then a discussion of pattern matching in general, followed by a description of the template format; the last section will detail the ways you can specify filename patterns. %%%%%%%%%%%%%% Command Line Format ___________________ Text-line Matching Mode: This mode searches text files for lines which match the supplied pattern. The command format is: MAT <pattern> <file-specifier>... where <pattern> is any pattern that does not contain slice-marks. The following arguments (at least one, but as many as you like) specify the text file(s) you want to search for the pattern; they may be simple file names (in the current directory), path names to files in another directory, or patterns that specify sets of matching files. See the section on File Name Patterns for the full range of possibilities. Each line found in the text files searched that matches the pattern will be sent to the standard output -- normally the screen, but it may be sent to a file or device with the standard DOS redirection operator (">"). The file being searched will normally be a standard text file, with each line terminated with a newline character, but non-text characters cause no problem, and it is not fatal if newlines are missing. If 256 characters are read before a newline is reached these will be treated as a line, and the scan will resume with the character following; if the string is output, though, it will have a newline added at the end, unless the NOLINES keyword is used (see keywords section). 
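The 256-character rule just described can be pictured with a short sketch. This is an illustration in Python, not Mat's own source (which is written in C); the function name `split_lines` and the `limit` parameter are made up for the example. One assumption to note: when a newline arrives immediately after a 256-character chunk, the sketch reports an empty line, which the manual does not confirm.

```python
def split_lines(text, limit=256):
    """Split text at newlines, or after `limit` characters without a newline."""
    lines = []
    current = []
    for ch in text:
        if ch == "\n":
            # Normal case: the newline ends the line and is not kept.
            lines.append("".join(current))
            current = []
            continue
        current.append(ch)
        if len(current) == limit:
            # No newline seen yet: treat what we have as a line and
            # resume scanning with the character that follows.
            lines.append("".join(current))
            current = []
    if current:
        # Last line of the file had no terminating newline.
        lines.append("".join(current))
    return lines


if __name__ == "__main__":
    sample = "x" * 300 + "\nshort line\nno newline at end"
    for number, line in enumerate(split_lines(sample), start=1):
        print(number, len(line))
```

Each string returned here corresponds to one "line" offered to the pattern matcher; on output a newline would be added back unless NOLINES is in effect.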
By default, text searches are case-sensitive: "A" does not match "a". You can ignore case in a search by including the NOCASE keyword in the command line. (Keywords are not shown in the basic command format to avoid complexity and confusion. Discussion of their general use has its own section below.) Value returned to the CLI: When Mat returns to the CLI, it passes back a value of zero if it has found at least one match. If it has found no matches at all it returns a "Warning" value of 5. This happens in all modes, and can be tested by a command script to see if the intended operation has been successful. If you should just want to know if a match exists, without needing to see any output, you can simply redirect this to NIL:. If Mat encounters an error which prevents it from continuing, like an incorrectly formed pattern, it will return at once with an error code of 20. Slice'n Splice Mode: This mode searches text files for matching lines, but instead of simply outputting matched lines, the lines found can be cut into pieces according to "slice-marks" in the pattern; the output lines are built from these and other items under the control of a template argument. The command format is: MAT <slice-pattern> <template> <file-specifier>... where <slice-pattern> is any pattern that contains at least one slice-mark ("^"). It must be followed immediately by the template argument that determines the format of each output line; these can be generated for both matching and non-matching source lines. File specifiers are the same as in the previous mode. For matched lines the template can rearrange the sliced pieces of the text and embed other constant text or such things as line number and current file name. For lines that don't match, the original line, fixed text, line number and so on can be output. Whatever its contents, each output string always ends with a newline, unless the NOLINES keyword is used (see below). 
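To make the slice-and-splice idea concrete, here is a minimal sketch in Python. It is not Mat's implementation: the names `cut` and `expand` are hypothetical, and the slice positions are simply passed in as numbers, as if a matcher had already found them. The selectors it understands ("^0" to "^4", "^N", "^O", "^F", "^B", "^Q" and the "^|" divider) are the ones described in the Templates section of this manual; "^I" and "^D" are left out for brevity, and a lone "^" at the very end of a template is copied through, which is an assumption.

```python
def cut(line, marks):
    """Return the five slices ^0..^4 for up to four slice positions."""
    points = [0] + list(marks[:4]) + [len(line)]
    pieces = [line[a:b] for a, b in zip(points, points[1:])]
    # Slices beyond the last mark are empty.
    return pieces + [""] * (5 - len(pieces))


def expand(template, line, marks=None, number=0, filename=""):
    """Build one output line. marks=None means the line did not match."""
    success, divider, fail = template.partition("^|")
    if marks is None and not divider:
        return None                      # simple template: no output on failure
    part = fail if marks is None else success
    values = {"N": "%05d" % number, "O": line, "F": filename,
              "B": "\n", "Q": '"'}
    if marks is not None:
        for index, piece in enumerate(cut(line, marks)):
            values[str(index)] = piece
    out = []
    i = 0
    while i < len(part):
        if part[i] == "^" and i + 1 < len(part):
            # Selectors that are unknown or not valid here are skipped.
            out.append(values.get(part[i + 1], ""))
            i += 2
        else:
            out.append(part[i])
            i += 1
    return "".join(out) + "\n"


if __name__ == "__main__":
    line = "this line contains the word we are looking for"
    start = line.index("word")
    print(expand("^1: ^0----^2", line, [start, start + 4]), end="")
```

Run as a script, this reproduces the introductory example: the slices around "word" are rearranged by the template "^1: ^0----^2".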
For details of the template format see the later section on the subject. Don't forget that if you use a template you MUST include at least one slice mark in the pattern you supply (even if you don't actually want to cut the matched line up). Otherwise the program will get very confused. Keywords: Variations on the above formats are controlled by keywords in the command line. In general, these may be placed either before the pattern argument or intermixed with file specifiers; they must never be put between a pattern and its template. The exact effect may depend on where on the command line they are placed; in many situations you could have several interspersed along the line. Mat always processes the command arguments in sequence, from left to right (unlike the position independent keywords of AmigaDOS commands). There are eleven possible keywords in this release of Mat (two of which are just shorthand for others): NOCASE, CASE, FILES (or F), STRING, JOIN (or J), FIRST, ALL, NOLINES and LINE. They may be in upper or lower case as convenient. The keywords FILES (F), STRING, and JOIN (J), set the mode of operation of the program; it is possible, but probably not sensible, to change the mode in the middle of a command line; there is no keyword to restore the default text search mode. The other keywords may be used in any mode where they are appropriate. NOCASE causes all subsequent searches to ignore the case of pattern and text characters. It can be put anywhere in the command line subject to the above restrictions; file specifiers appearing before it will not be affected. CASE cancels the effect of a previous NOCASE on the line; as this is the default, you probably won't need it very often. FILES (or its shorthand alternative F) selects directory filename search mode (see next section) rather than the default text file search. It may change the command syntax: if it is the first argument, it REPLACES the usual pattern argument, and MUST be followed by a template. 
You may also place it after a slice-pattern/template pair; it is even permissible to put it between file specifiers if for some odd reason you wished to mix the two types of searches, but this is not recommended. STRING selects a mode where the pattern is matched against the literal argument strings that follow on the command line. Either a simple pattern or sliced-pattern/template pair can be used. This mode is intended primarily for EXECUTE command scripts where you might want to test that a supplied argument satisfied some pattern constraints, or slice and rebuild it in some way. You could also use it from the keyboard to watch the effect of a particular pattern on various strings. JOIN (or its shorthand J) needs neither pattern nor template (and both MUST be omitted if the keyword is placed first). It causes all the files that match the specifier arguments to be sent in sequence to the standard output. No matching or other processing is done on the contents (and these may be anything -- not necessarily text). FIRST is only appropriate in text matching modes. It causes the search of each file after it on the command line to terminate when the first match is found. It is useful when you just want to determine which files contain a pattern, rather than listing every occurrence. It is compatible with templates and other options. ALL reverses the effect of FIRST if you should need to do so within a command line. It will probably never be needed. NOLINES prevents the usual newline character being output after each match. All subsequent matches will be shown on the same line unless the template dictates otherwise. Don't forget that you will usually want some sort of separator in the template, such as a space. It can be used in any mode. LINE reverses the effect of NOLINES if this has been given previously. (Apologies for the plural/singular disparity, but it isn't quite the inverse.) 
It also inserts a newline into the output at that point; you can use it just for this if you want an extra blank line between file specifiers. If for some odd reason you should have to specify a pattern that is exactly one of these keywords you can easily distinguish it by putting parentheses around it or appending the null-string-match character "%". Filename Search Mode: This mode searches for filenames which match the supplied specifiers; it does not examine the contents of the files. A template is always required to specify the form of the output. There are two command forms for this mode: MAT FILES <template> <file-specifier>... MAT <slice-pattern> <template> FILES <file-specifier>... In all cases, the abbreviation F can be used instead of FILES. In the first form, the FILES keyword replaces the usual pattern argument; it must be followed by a template. This simply finds all files which match the supplied specifiers. Slice marks may be included in the filename part of a specifier (but not in the path part); if they are present they will be recognized by the template, but they are always optional. In this particular form, the template does not require that slice-marks be present. As in other filename searches, the match never pays any attention to the case of either pattern or filename. In the second form, both a slice-pattern and a template must be present. Any filenames which match the specifiers will then be matched against the main pattern and the appropriate action taken; any slice marks in the specifiers are just ignored. There are two notable effects of this two-stage matching. First, by default the final stage is case-sensitive -- though the NOCASE keyword will reverse this. Second, you can output lines for names that DON'T match in the second stage, as well as ones that do. Literal String Match Mode: This mode tests string arguments in the command line against the pattern. The match can either be simple or sliced with a template. 
Possible command forms are: MAT STRING <pattern> <string-argument>... MAT STRING <slice-pattern> <template> <string-argument>... The STRING keyword could also occur after the pattern or pattern/template pair. Putting it after one or more file specifiers would change modes in the middle of the command; it is remotely possible that this might be useful. Output from this mode is just as in the other modes, and as usual it will return the value 5 to the CLI if all matches fail. File Concatenation Mode: With this mode, you can join multiple files into a single stream sent to the standard output. You can use it as a multiple "TYPE" command, or -- if you redirect the output to a file -- as a "JOIN" command that handles patterns. MAT JOIN <file-specifier>... You may use J as an abbreviation for JOIN if you wish. No pattern or template should be included. The program pays no attention to the contents of the files: they are simply treated as byte streams. ======== Pattern Matching ________________ The pattern matching algorithm used by Mat is an extension of the standard file pattern matching scheme used by AmigaDOS. Many people may not appreciate how general and flexible the method is. It is many times more capable than the simple "wild-card" matching available on most personal computers. There are some things that the standard algorithm doesn't have which would often be useful, and I have done my best to supply some of these in this extended version. The discussion that follows may be a fuller exposition of how to use pattern matching than is available from other sources. If you leave out references to the "universal-match" character "*", "negation matches", and "slicing", everything discussed applies just as well to standard AmigaDOS patterns, which can be used in commands like LIST, DELETE, and COPY. A pattern is a text string constructed from "plain characters" and "special characters". It represents a set (possibly a large set) of text strings that will match it. 
Remember that it always matches complete strings; this is not the same as a simple text search, where a match is signalled if the search string is found anywhere within the source text. The string being matched by the pattern is always "bounded" in some way, either because it stands alone -- like a file name -- or because, say, it is a complete line of text. The newline character at the end is not usually available to the matching process. If a pattern argument in a command line contains spaces, it must of course be enclosed in quotes. There is no way of including quotes in a pattern which is itself enclosed in quotes, unfortunately, (because of the way C handles argument strings). The syntax of the pattern structure is such that complex patterns can be built from simple ones. Broadly speaking, patterns may be chained end to end so that successive segments of a complete target string may be matched by successive segments of the pattern. In addition, each pattern segment can specify "alternatives": if any of these match, the whole segment matches. Plain Characters: The simplest pattern is a string of plain characters. This will only match a target string consisting of exactly the same characters in the same order, which is obviously of limited usefulness. The only case where you are likely to want this is when getting a particular file name, and the program is smart enough to go directly to the file in this case rather than doing a search. Special Characters: To build more general patterns we need the special characters. These do not represent themselves (unless special action is taken): they are instead structural elements that form the structure of the patterns we desire. Using them we can build patterns -- or subpatterns -- that will match, say, any single character, any five characters, any arbitrary string, or a string that is one of several possible specific alternatives. 
We can then put such subpatterns together to end up with a complete pattern that will match all the various possibilities we are looking for and no others. The possibilities should become clearer as we get to specific examples.

The seven special characters used in AmigaDOS file matching are:

        '  ?  |  (  )  #  and  %

To these Mat adds two more:

        ~  and  ^

We'll look at them briefly in order, before we get into a fuller exploration:

    " ' "  makes the character following it into a plain character.
    " ? "  matches ANY single character.
    " | "  separates alternative patterns.
    " ( " and " ) "  enclose patterns used in building larger ones.
    " # "  causes a match to any number of repetitions of the pattern it precedes.
    " % "  matches the null string when syntactically necessary.
    " ~ "  is one way (of two) of specifying negation.
    " ^ "  slices a matched string into segments.

Quoting Characters:

The single quote (" ' ") is used to turn any special character immediately following it into a plain character. Thus to match against an actual question mark in a target text you would include the pair " '? " in the pattern. And of course it can quote itself.

Matching Any Character:

The question mark matches ANY single character. Thus:

        ???

matches "abc", "xyz", and so on, but not "ab" or "abcd".

Matching Alternatives:

The vertical bar (" | ") separates "alternatives". If any of a set of patterns separated by bars matches the target, the match is successful. For example:

        abc|def|qwertyuiop

would match any of those three strings, but no others. The pattern

        abc|x?z

would match "abc", or "x" and "z" separated by any single character.

Building Patterns from Others:

The left and right parentheses can be used to enclose a pattern that you want to match as a unit when it is part of a larger pattern.
As one example we could look for any two characters followed by "abc" or "def" with the pattern:

        ??(abc|def)

Combine two or more patterns in sequence this way:

        (abc|def)(xxx|yyy)

This will match "abcxxx", "abcyyy", "defxxx", and "defyyy". Patterns can be nested as far as you like with parentheses:

        a(bc|??(xx|yy))d

will match "abcd", or any six-letter group beginning with "a" and ending in "xxd" or "yyd". Redundant parentheses do no harm. They may be useful to distinguish patterns from other constructs.

Pattern Repetition:

The " # " character is always followed by a (sub)pattern. It will match ANY number of (exact) repetitions of that pattern (INCLUDING zero). The pattern may be a single letter, but if it isn't it must be enclosed in parentheses. Thus:

        #(ab)

matches "ab", "abababab", or simply an empty string. It does not match "ababa". The pattern to be repeated may be any legal pattern, including more repetition constructs if you want:

        #(ab|?x|#(xy)z)

will match such strings as "abab", "zxab", "qxxyxyxyxyzxyab", and so on. It will NOT match "abxy".

Matching the Empty String:

The " % " character is used where you have to specify an empty ("null") string -- normally as one of a number of alternatives. The construction

        (|abc)

is not legal; instead you must use:

        (%|abc)

which will match either "abc" or the null string.

Negated Matching:

Mat extends the basic pattern matching syntax by allowing you to specify patterns that if matched will cause the overall match to fail. If a negated segment is included in a pattern, and the target string has ANY POSSIBLE match of the whole pattern that includes that segment, the match cannot succeed. There are restrictions on negation patterns not shared by the structures we've talked about up to now; in particular they can't be nested -- you can't negate a negation -- although they can be inserted at any level in the pattern. There are two ways of specifying negated patterns.
The first will match ANY string UNLESS it exactly matches the pattern; it is constructed by prefixing the pattern by the tilde (" ~ "):

        ab~(cd)e

will not match "abcde", but will match any other string that begins with "ab" and ends with "e", such as "abxxxe", "abe", "abce", etc.

The second form is a "negated alternative", indicated by two adjacent vertical bars (" || "). This is used when, rather than matching ANY string that is not the negated one, you have a set of patterns you want to match UNLESS the negated part is also matched. Thus:

        a(b?d|?c?||bcd)

will match four-character strings such as "abxd", "accc", "abcx", as long as the whole string is not "abcd".

You can have more than one negated segment, as long as one does not appear inside another. Thus the following sort of thing is possible (whether it's also useful though...?):

        a~bc~(de)(???||fgh||xyz)

Remember that this will be forced to fail if there is any possible match that includes a negated section, but on the other hand the tilde construction matches any string that is not exactly the one specified. Thus these will succeed:

        acxxx
        abbbcddeabc
        acdexy

and these will fail:

        abcxxx
        abbcdexxx
        aczxsdefrgthcjxsxcxyz

Slicing the Matched String:

If it is appropriate to the function of the program, you can include "slice marks" (the caret -- " ^ ") in your pattern to select out pieces of the matched string that can be treated individually. The way these pieces are accessed is not the concern of the matching procedure; in the case of Mat, the template argument provides ways of referencing them.

Once again there is a restriction on the use of this character that does not apply to the others: only the first four of these marks encountered during a match will be recorded; any after this will be ignored. Note that this doesn't mean you can only include a maximum of four marks; if they are inside alternatives that don't match any part of the target string, the scan will never encounter them.
You should be sure of what you are doing, though, if you don't want to be surprised by the program's choices. We'll return to this, and some other points you should note about the behavior of slice marks, later.

If there is more than one possible match of the pattern to the target, the slice will be made at the earliest possible point. Remember this especially when you have repetitions in your pattern.

Examples:

    The pattern         #?^x#?
    will cut            abcdxyz
    into                abcd   xyz

    It will also cut    abcxxxx
    into                abc   xxxx

    The pattern         #?^x#?y^#?
    will cut            abcxxxxyz
    into                abc   xxxxy   z

    The pattern         #?^#x^#?
    won't cut much of anything! (because "#x" also matches the null string.) The first two slices will simply always be empty, and slice three will contain the whole string.

    The pattern         #?^(word|another)^#?
    will cut            "here is another word for you"
    into                "here is"   "another"   "word for you"
    (using quotes in this case to mark off the slices). Notice that the cuts are made around "another" rather than "word" because the earliest match is found.

Slice marks within alternatives can be used, as noted above, but are tricky. Because of the way the marks are recorded internally, if two different alternatives containing them match, both marks will be reported but the position of one of them will be wrong (probably at the beginning of the string). So it is best to keep the slice marks outside of any alternation constructions (as shown in the last example above).

Templates
_________

The Templates Mat uses to generate output lines are basically simple text strings with "splice-markers" that indicate where the pieces of the matched string and other items are to be inserted. The text segments of a template can be anything you want (except a newline -- there is a marker for this).
A special marker can be used to divide the template string into "success" and "fail" halves; the "success" part controls the format of output lines for matches, while the "fail" part will be output for each input string that doesn't match. Output strings are always terminated with a newline. Each marker is a character pair: the caret (" ^ ") followed by a selector character. Slices from the matched string are numbered -- "^0" to "^4" . Other items have identifying letters, such as "^N" for line number; the case of these letters is important (all are currently upper case because you are already holding down the shift key for the caret). The success/fail divider uses the vertical bar: "^|". Not all selectors are valid under all conditions. For example you can't use slices in the "fail" section of a template because -- obviously -- there aren't any. Line numbers, on the other hand, are only appropriate in text matching, not in file name matching mode. If you use a selector that is not valid it is simply skipped over. Of course you can use any selector more than once within a template. If a template argument in a command line contains spaces, it must of course be enclosed in quotes. As with a pattern, you can't include quotes in a template which is itself enclosed in quotes: use the "^Q" selector instead. Slice Selectors: As four slice marks are allowed in a pattern, there can be a maximum of five slices of the matched string. These are selected by "^0" for the piece from the beginning of the string to the first mark, "^1" for the piece between the first and second, up to "^4" for the remainder of the string beyond the fourth mark. If there are fewer than four slice marks, the slice associated with the final existing mark extends to the end of the string, and all higher-number pieces are empty. Thus if there are only two marks, "^2" covers the remainder of the string, and "^3" and "^4" are empty. For instance, if we use this pattern, with two slice marks: #?^word ^#? 
and this template -- which will omit slice 1: ^0^2 to match and rearrange the string: "this word will be missing" we will end up with: "this will be missing" Line Number Selector: The selector characters "^N" placed in a template string will insert the current line Number within the file being scanned at that point in the output string. The number is always five digits, with leading zeros visible. At the moment there is no option to suppress leading zeros, partly because it makes it easier to line up columns and partly because I wanted fast code there [and don't forget the laziness syndrome, Peter...]. It can be used in both "success" and "fail" portions of a template. So the pattern #?^(word|another)^#? and template ^N: ^1 would generate something like 00234: another Index Number Selector: The pair "^I" inserts an Index number representing a count of matches so far. The count is kept from the beginning of the program, and is not reset with a new file. It also works in file name matching mode. You may use it in the "fail" section of a template, but remember it will indicate the number of matches, not lines output. The format is the same as "^N". Original String Selector: The pair "^O" ("Oh", not "zero" -- I probably should have chosen a better one...) represents the unsliced Original string. It can be used in both the "success" and "fail" parts of a template. Thus, to simply put a line number in front of each matched line, you could use the template: ^N: ^O In File Matching mode, this selector is the same as "^F" (below). Line Break Selector: The pair "^B" Breaks the output line at that point with a newline character. For instance, to output line number and slice-1 on one line, followed by the original string on a new line, use: ^N: ^1^B^O Quote Mark Selector: It is not usually possible to embed quote marks in template strings directly, so you can use the selector "^Q" to make them appear at that point in the output line. 
    ^0 ^Q^1^Q ^2

File Name Selector:

"^F" selects the local name of the current File (i.e. without any directory prefix), in both text and file name matching modes. For example, if you have a filename specifier argument (see later)

    :work#?/#?.txt

which has found the file

    Work Disk:work_1/sample.txt

the "^F" selector will insert

    sample.txt

Directory Path Selector:

"^D" in a template will insert the path to the Directory of the current file, as seen by Mat, based on the specifier argument it is using. The exact form of this string depends on the way you have formed the directory part of the file specifier argument. It does NOT always contain a complete path from device to file. If you are just assuming the current directory the string will be empty. If the file is in a different directory it will show the chain of directories it has used to reach it, using the full directory names it has found.

Some specific examples: If we used the file specifier as in the previous section, and found the same file:

    Work Disk:work_1/sample.txt

^D would insert

    Work Disk:work_1

On the other hand, if the specifier was

    work#?/#?.txt

^D would give

    work_1

You should especially note that if you did not use a device specifier (":") in the first section of your specifier, yet the first directory in the chain IS in fact a root directory, you will see a slash "/" separator rather than the colon in the string supplied by ^D. Thus if your specifier happened to be

    /work#?/#?.txt

the Directory path would be shown as

    Work Disk/work_1

Failure Template Marker:

A simple template is only applied to strings which have been matched, and nothing is output when there isn't a match. You can split the template, however, into two subsections with the special success/fail division marker "^|". The section preceding this mark is applied for a successful match just like a simple template; the section following it is used if the match fails.
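The selector rules described so far can be summarised in a small sketch. This is not Mat's own code (Mat is written in C); it is a hypothetical Python model, and the function name `expand` and its parameters are invented for illustration. It assumes the pattern matcher has already produced the slices, that a selector letter it does not recognise is skipped, and that a lone caret at the very end of a template is copied through; the manual does not say what Mat does in those last two cases.

```python
def expand(template, original, slices=None, line_no=None, index=0,
           file_name="", dir_path=""):
    """Expand a Mat-style template for one input string.

    slices: list of up to five pieces if the string matched, None if not.
    line_no: None models file name matching mode, where ^N is not valid.
    Returns the output text, or None when nothing should be output.
    """
    success, divider, fail = template.partition("^|")
    matched = slices is not None
    if matched:
        part = success
    elif divider:
        part = fail
    else:
        return None  # simple template: no output for a non-match

    out = []
    i = 0
    while i < len(part):
        ch = part[i]
        if ch != "^" or i + 1 >= len(part):
            out.append(ch)  # ordinary character (or trailing caret: assumption)
            i += 1
            continue
        sel = part[i + 1]
        i += 2
        if sel in "01234":
            if matched:  # slices are skipped in the "fail" section
                n = int(sel)
                out.append(slices[n] if n < len(slices) else "")
        elif sel == "N":
            if line_no is not None:
                out.append(f"{line_no:05d}")  # five digits, leading zeros
        elif sel == "I":
            out.append(f"{index:05d}")  # same format as ^N
        elif sel == "O":
            out.append(original)
        elif sel == "B":
            out.append("\n")
        elif sel == "Q":
            out.append('"')
        elif sel == "F":
            out.append(file_name)
        elif sel == "D":
            out.append(dir_path)
        # anything else is skipped over (assumption for unknown letters)
    return "".join(out) + "\n"  # output is always newline-terminated


# The slice example from the manual: omit slice 1.
print(expand("^0^2", "this word will be missing",
             ["this ", "word ", "will be missing"]), end="")
# prints: this will be missing
```

Passing `slices=None` models a string that did not match, so a template without "^|" yields no output at all, while a split template falls through to its "fail" half.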
In the "fail" section, any selectors desired can be used, except the five slices "^0" - "^4". A simple use would be to output all lines, whether or not they matched, but mark or rearrange the matched lines in some way. For example the following would output them all but put a marker and index number on each matched line (and corresponding blanks before an unmatched one):

    MATCH[^I]> ^O^|              ^O

File Specifiers
_______________

The arguments in the command line you supply to specify the files that Mat will examine are really just like those you might give to any AmigaDOS command, but there are one or two extra features.

For text file searches you will probably most often want to specify a single file. You do this in the usual way with either the local name of a file in the same directory, or a path name that includes the chain of directories needed to reach that file in another.

In place of the simple file name, you can use a pattern to match a group of files in the same directory. Unlike other AmigaDOS commands this pattern can employ the extended matching features described above ("*", "~", and "||"). Slice marks can also be used where they are appropriate (see below).

You can also use patterns in the directory part of the specification, in just the same way as in the filename part. (Did you know that you can also do this in most AmigaDOS commands supporting patterns, such as DELETE?) All the directories matching that specification will be searched in turn.

However, you cannot split a pattern across directories -- in other words, a pattern must not include a device or directory separator (":" or "/"). This means that a given pattern can only match directory names at a certain "level" in the file hierarchy of the disk. Also you cannot use a pattern in a device specifier -- these must be simple names. To search more than one level, or more than one device, you must have more specifier arguments.
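The two restrictions just described (no pattern in the device part, and no pattern that spans a device or directory separator) can be illustrated with a rough check. This is a hypothetical Python sketch, not a description of how Mat parses its arguments: the function name `check_specifier`, the set of characters treated as pattern characters, and the rule that a separator "inside a pattern" means one inside parentheses or directly after "#" are all assumptions made for the example.

```python
# Characters assumed to introduce pattern syntax (an assumption; the
# manual's pattern section is the authority on the real set).
PATTERN_CHARS = set("#?()|*~")


def check_specifier(spec):
    """Return a list of problems found in a file specifier (empty if none)."""
    problems = []
    device, colon, path = spec.partition(":")
    if not colon:
        # No device part at all: the whole argument is a path.
        device, path = "", spec

    if any(c in PATTERN_CHARS for c in device):
        problems.append("pattern in device part")

    depth = 0   # current nesting level of parentheses
    prev = ""   # previous character in the path
    for c in path:
        if c == "(":
            depth += 1
        elif c == ")":
            depth = max(depth - 1, 0)
        elif c in "/:" and (depth > 0 or prev == "#"):
            problems.append("pattern includes directory separator")
            break
        prev = c
    return problems


for spec in ["df1:work/myfile", ":(work|old)/my#?", "df(0|1):#?/#?"]:
    print(spec, "->", check_specifier(spec) or "ok")
```

The check deliberately knows nothing about slice marks, and it does not try to decide whether the patterns themselves are well formed.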
In File Name Search mode, if you don't supply any other pattern, you may put slice marks in the file name portion of the specifier. You cannot place them in the directory part. Except in this particular situation, with no main pattern present, slices in the filename will be ignored.

Examples:

These are valid file specifiers:

    myfile.txt
    my#?file(.txt|_bak)
    df1:work/myfile
    :work/myfile
    /work/myfile
    :(work|old)/my^#?

These are not:

    df(0|1):#?/#?         -- pattern in device part
    df1:/#(work/)myfile   -- pattern includes directory separator
    :w^#?/my^#?           -- slice mark in directory part

The Real World
______________

Even before it had reached its current form, this program was put to good use a couple of times. One case needed the text match facility, the other the filename match.

I had a documentation file for another project, written in "proff" text formatter format, and I wanted to convert it to "troff" for typesetting on a Un*x system. I ran it through Mat several times, first simply to locate all the formatter commands -- a simple job because they are all lines beginning with a period --, and then to actually rewrite some of the commands in troff form, slicing up the original commands and reusing the appropriate parts. I was even able to generate some added commands to create indented paragraphs and so on.

The other application also involved text files, this time an article I had written for a newsletter. The disk I passed on to the magazine's editor already had two versions of the text, and he had to cut it further into pieces for the layout program. So when it came time to put Part 2 on the same disk, I had all these old files -- which I didn't want to throw out -- cluttering the top level of the disk. Of course I could have simply copied them to a new directory and then deleted the originals, but for one thing that would change their date. Better to rename them all to be in a new directory, except that without Mat I would have had to do each one individually.
The commands I used were something like:

    makedir old
    mat >ram:rn f "rename ^F as old/^0_Part.1_^1" article^#?
    execute ram:rn

(except that I was using Sili(Con:), so I could type simply "ram:rn", rather than "execute ram:rn"...).

As another example, I have a little command script file I call "ref", which searches for the pattern given as its first argument in the files corresponding to the following ones, and prints out matching lines with the match itself highlighted:

    .K PAT,FILE1,FILE2,FILE3
    Mat "#?^<PAT>^#?" "^0<esc>[1;33;40m^1<esc>[0;31;40m^2" <FILE1> <FILE2> <FILE3>

Notice that the pattern argument you supply is automatically surrounded with universal matches and slice marks. If you aren't familiar with the strange strings in the template, these are ANSI control sequences recognized by the console device to change text color; <esc> is the ESCape character. To create a script file like this of course you'll need an editor such as EMACS that can handle <esc> as a character.

Here is a short script that is convenient for sending multiple files to your printer; it is a little better than PRINTFILES (the 1.2 Extras command) in that it allows filename patterns (and is configured to run automatically in the background):

    .K f1,f2,f3,f4,f5
    .bra {
    .ket }
    ;*** caution -- the next line is too long for most printers! ***
    run mat >t:_pr f "cd ^Q^D^Q^Btype >prt: ^F^Becho >prt: ^Q^Q" {f1} {f2} {f3} {f4} {f5} +
    execute t:_pr +
    delete t:_pr

It also shows up some of the deficiencies of the current version of Mat (as elaborated in the next section): for example it is better to CD to the "^D" directory because it is not always possible to concatenate directory and filename (if "^D" is a device for instance); also if you aren't careful in specifying pathnames, "^D" may not be a proper directory identification string -- see the discussion on "^D" in the Template section.
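To make the rename example from this section concrete, here is a rough Python emulation of what the generated "ram:rn" script might contain. It is only a sketch: the file names are invented, the function name `rename_commands` is hypothetical, and the pattern "article^#?" is modelled with an ordinary regular expression rather than Mat's own matcher. Matching here is case-sensitive, which is a simplifying assumption and may not reflect Mat's behaviour.

```python
import re


def rename_commands(names, prefix="article"):
    """Emulate: mat f "rename ^F as old/^0_Part.1_^1" article^#?

    The slice mark after the prefix splits each matching name into
    slice 0 (the prefix) and slice 1 (everything after it).
    """
    pattern = re.compile(re.escape(prefix) + r"(.*)", re.DOTALL)
    lines = []
    for name in names:
        m = pattern.fullmatch(name)
        if m is None:
            continue  # simple template: nothing is output for a non-match
        slice0, slice1 = prefix, m.group(1)
        lines.append(f"rename {name} as old/{slice0}_Part.1_{slice1}")
    return lines


# Hypothetical directory contents, for illustration only.
for line in rename_commands(["article.txt", "article_b", "notes"]):
    print(line)
# prints:
# rename article.txt as old/article_Part.1_.txt
# rename article_b as old/article_Part.1__b
```

File names containing spaces would need quoting in the generated commands, which is what the "^Q" selector is for; the sketch leaves that out to stay close to the template shown above.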
Deficiencies and Prospects
__________________________

Mat is obviously missing some things in its current incarnation. When and whether they get added will depend both on my own further needs and moods and on your feedback. (I'd be delighted with monetary contributions, but this is for you to use and distribute anyway...)

The matching algorithm, even though extended over the original, is in need of a couple more options. Un*x "regular expressions" have a way of specifying sets and ranges of characters, and while I don't think Mat needs sets -- alternation handles this reasonably well -- some way of specifying ranges would be a big advantage. I have often wanted to search for "any letter", say. I think the best way to implement this would be to define specific "range selectors" such as "any letter", "any digit", "any upper case", and so on, probably using a convention like "\a".

The selectors available for template use could also be expanded. In particular there is an obvious need for a true directory path, rather than the one that Mat currently assembles from the individual directory names it has traversed. Any other suggestions?

Keywords are easy to add, so... Oh, you noticed -- yes, well their number did grow rather rapidly, but there are still a few I would like to add. HEAD and TAIL would be followed by template-type arguments that would generate output at the beginning and end respectively of each file. TITLE would do the same thing wherever it occurred in the command line. These would be especially useful in NOLINES mode.

The matching algorithm is slower than I would like. Rewriting it in assembly would doubtless help, but there is also a need for a "quick match" feature for simpler patterns closer to those that SEARCH can handle.

I have other vaguer notions, too, such as how to request multiple levels of directories in a specifier, but I'll have to think about them awhile.
Distribution and Copyrights
___________________________

Mat itself and this manual are copyright, but may be freely distributed without charge. Commercial use is prohibited without the express written permission of the author.

The matching algorithm code is public domain. It is an extension of an algorithm in an original article by Martin Richards in "Software Practice and Experience" 1979.

The source is in fairly generic 'C'; it has only been compiled under Lattice 3.10, but should be readily transportable.

Remarks and Suggestions to:

    Peter Goodeve
    3012 Deakin Street #D
    Berkeley, Calif. 94705

%%%%%%%%%%%%